[Experimental] Add a RuntimeOption to set inter and intra op threadpool sizes #410
base: master
Conversation
@@ -93,7 +93,7 @@ void check_tf_status(const tensorflow::Status &status)
 }

 // Get TF session options given Neuropod RuntimeOptions
-tensorflow::SessionOptions get_tf_opts(const RuntimeOptions & /*unused*/)
+tensorflow::SessionOptions get_tf_opts(const RuntimeOptions &runtime_opts)
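A minimal sketch of what this mapping might look like. The struct definitions below are local stand-ins (the real `RuntimeOptions` lives in neuropod and the real config is TensorFlow's `SessionOptions`); only the field names are taken from the PR's diff.

```cpp
#include <cstdint>

// Hypothetical mirrors of neuropod::RuntimeOptions and the relevant part
// of tensorflow::SessionOptions, for illustration only.
struct RuntimeOptions
{
    uint32_t experimental_inter_op_parallelism_threads = 0;
    uint32_t experimental_intra_op_parallelism_threads = 0;
};

struct SessionConfig
{
    int32_t inter_op_parallelism_threads = 0;
    int32_t intra_op_parallelism_threads = 0;
};

// Copy the experimental thread-pool settings into the session config.
// A value of 0 is passed through, leaving the framework's default in place.
SessionConfig get_tf_opts(const RuntimeOptions &runtime_opts)
{
    SessionConfig config;
    config.inter_op_parallelism_threads =
        static_cast<int32_t>(runtime_opts.experimental_inter_op_parallelism_threads);
    config.intra_op_parallelism_threads =
        static_cast<int32_t>(runtime_opts.experimental_intra_op_parallelism_threads);
    return config;
}
```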
Not related to this change, just some thoughts: RuntimeOptions is now used in the C++, C, and Java APIs (and could be in Python too), so we need to keep them in sync. I have seen that TensorFlow uses a proto declaration in such cases and then generates structs for each language.
We could take a similar approach.
Makes sense - definitely something to look into.
I don't like TF's approach to options in their C API, though: they require a buffer containing a serialized proto as input, which makes it fairly complicated to set options directly from C.
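To illustrate the complaint: `TF_SetConfig` in TF's C API takes a serialized `ConfigProto` buffer, so a C caller has to produce protobuf wire format by hand. The field numbers below are assumed from TensorFlow's `config.proto` (`intra_op_parallelism_threads` = 2, `inter_op_parallelism_threads` = 5); this is a sketch of the encoding burden, not a production encoder.

```cpp
#include <cstdint>
#include <vector>

// Append one varint-typed protobuf field (wire type 0) to the buffer.
static void append_varint_field(std::vector<uint8_t> &buf, uint32_t field, uint32_t value)
{
    buf.push_back(static_cast<uint8_t>(field << 3)); // tag: field number, wire type 0
    while (value >= 0x80)
    {
        buf.push_back(static_cast<uint8_t>(value) | 0x80);
        value >>= 7;
    }
    buf.push_back(static_cast<uint8_t>(value));
}

// Hand-encode a minimal ConfigProto setting the two thread-pool fields.
// Field numbers are assumptions taken from tensorflow's config.proto.
std::vector<uint8_t> make_config_proto(uint32_t intra_threads, uint32_t inter_threads)
{
    std::vector<uint8_t> buf;
    if (intra_threads != 0) append_varint_field(buf, 2, intra_threads);
    if (inter_threads != 0) append_varint_field(buf, 5, inter_threads);
    // The result would then be passed to the C API, roughly:
    //   TF_SetConfig(session_opts, buf.data(), buf.size(), status);
    return buf;
}
```

Compare this with a plain options struct, where the same settings are two field assignments.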
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@ Coverage Diff @@
##           master     #410      +/-   ##
==========================================
- Coverage   88.04%   87.83%   -0.22%
==========================================
  Files         106      106
  Lines        6893     6928      +35
==========================================
+ Hits         6069     6085      +16
- Misses        824      843      +19
// See https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html#runtime-api
if (options.experimental_inter_op_parallelism_threads != 0)
{
    at::set_num_interop_threads(static_cast<int32_t>(options.experimental_inter_op_parallelism_threads));
As I see it, there is a torch::set_num_interop_threads that is supposed to be the "public" one. Minor, but I think it is still an issue.
Same for the other call below.
if (options.experimental_intra_op_parallelism_threads != 0)
{
    at::set_num_threads(static_cast<int32_t>(options.experimental_intra_op_parallelism_threads));
We need to give access to the get_num_* functions somehow. The user sets the runtime options (or not), then starts neuropod execution, and should be able to check which settings are actually in use. This is important for the "default" case, where the system picks the values, and also for the IPE case, where models share them.
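A hedged sketch of the getter being asked for: report the thread counts actually in effect, including the "default" case where 0 means "let the framework pick". The assumption that the framework default equals the hardware concurrency is illustrative only; in the real backend the values would come from the frameworks themselves (e.g. at::get_num_threads() / at::get_num_interop_threads() in libtorch).

```cpp
#include <algorithm>
#include <cstdint>
#include <thread>

// Effective (not requested) settings, after defaults have been resolved.
struct EffectiveThreadSettings
{
    uint32_t inter_op_threads;
    uint32_t intra_op_threads;
};

// Resolve requested values: 0 means "framework default", which we
// approximate here as the hardware concurrency for illustration.
EffectiveThreadSettings get_effective_settings(uint32_t requested_inter, uint32_t requested_intra)
{
    const uint32_t hw = std::max(1u, std::thread::hardware_concurrency());
    return EffectiveThreadSettings{
        requested_inter != 0 ? requested_inter : hw,
        requested_intra != 0 ? requested_intra : hw,
    };
}
```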
#if CAFFE2_NIGHTLY_VERSION >= 20190808
// Set intra and inter op parallelism
// See https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html#runtime-api
if (options.experimental_inter_op_parallelism_threads != 0)
For the inter-op case, setting the value more than once is only allowed in the TBB build. So in our case, if a second IPE model arrives with a non-zero value, this will fail. As I see in the code, for non-TBB builds it sets an atomic variable, and I think the same could be done here, while still keeping the ability to change it in the TBB case. I am considering building libtorch with TBB; we have a TorchScript model/use case where TBB's "better" concurrency could be critical.
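A sketch of the "set once, tolerate repeats" behavior described above: the first caller wins, a second caller requesting the same value is a no-op, and a second caller requesting a different value is rejected instead of aborting the process. This mirrors (but does not use) the atomic guard in libtorch's non-TBB inter-op pool; the function name is hypothetical.

```cpp
#include <atomic>
#include <cstdint>

// -1 means "not set yet"; once set, the value is frozen for the process.
std::atomic<int32_t> g_inter_op_threads{-1};

// Returns true if `requested` is now the effective value.
bool try_set_inter_op_threads(int32_t requested)
{
    int32_t expected = -1;
    if (g_inter_op_threads.compare_exchange_strong(expected, requested))
    {
        return true; // first caller: value accepted
    }
    // Already set: a repeat request for the same value is harmless,
    // a conflicting request is rejected rather than throwing.
    return expected == requested;
}
```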
Summary:
Allow users to configure intra-op and inter-op thread pool sizes for the underlying frameworks.
Note: This API is experimental and may change in the future.
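Illustrative usage, assuming RuntimeOptions grows the two experimental fields from this PR. The struct here is a local stand-in, not neuropod's actual declaration, and the constructor call shown in the comment is approximate.

```cpp
#include <cstdint>

// Hypothetical mirror of the RuntimeOptions fields this PR adds.
struct RuntimeOptions
{
    uint32_t experimental_inter_op_parallelism_threads = 0; // 0 => framework default
    uint32_t experimental_intra_op_parallelism_threads = 0; // 0 => framework default
};

// Cap each framework's thread pools, e.g. when several models share a host.
RuntimeOptions make_constrained_options()
{
    RuntimeOptions opts;
    opts.experimental_inter_op_parallelism_threads = 1;
    opts.experimental_intra_op_parallelism_threads = 4;
    // The real call would look something like:
    //   Neuropod model(path, opts);
    return opts;
}
```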
Test Plan: